An LCS-based string metric
Abstract
These notes present a string similarity measure that is a metric in the mathematical sense; in particular, it satisfies the triangle inequality. The metric is based on the longest common subsequence (LCS), and any sensible implementation runs in no worse than O(nm) time for strings of lengths n and m.
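The abstract does not state the formula itself, so the following is only a minimal sketch of what an LCS-based distance can look like. It assumes the normalised form d(x, y) = 1 - |LCS(x, y)| / max(|x|, |y|), which is an assumption made here for illustration and not a definition quoted from the notes; the function names lcs_length and lcs_distance are likewise hypothetical. The LCS length is computed with the textbook dynamic programme, whose running time is proportional to the product of the two string lengths.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Textbook dynamic programme: time O(len(a) * len(b)),
    memory O(min(len(a), len(b))) because only two rows are kept.
    """
    if len(a) < len(b):
        a, b = b, a  # iterate the shorter string in the inner loop
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common subsequence ending just before both characters.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop one character from either string and keep the better result.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_distance(a: str, b: str) -> float:
    """Assumed normalised distance: 1 - |LCS| / max(|a|, |b|).

    This formula is an illustrative assumption, not taken from the abstract.
    Two empty strings are treated as identical (distance 0).
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 1.0 - lcs_length(a, b) / longest
```

Under this assumed form, identical strings are at distance 0, strings with no common character are at distance 1, and every other pair falls in between; for example, lcs_distance("kitten", "sitting") uses the common subsequence "ittn" of length 4 against a maximum length of 7.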
Similar resources
Sparse LCS Common Substring Alignment
The “Common Substring Alignment” problem is defined as follows. The input consists of a set of strings S1, S2, ..., Sc, with a common substring appearing at least once in each of them, and a target string T. The goal is to compute the similarity of all strings Si with T, without computing the part of the common substring over and over again. In this paper we consider the Common Substring Align...
Two Algorithms for LCS Consecutive Suffix Alignment
The problem of aligning two sequences A and B to determine their similarity is one of the fundamental problems in pattern matching. A challenging, basic variation of the sequence similarity problem is the incremental string comparison problem, denoted Consecutive Suffix Alignment, which is, given two strings A and B, to compute the alignment solution of each suffix of A versus B. Here, we prese...
Semi-local longest common subsequences in subquadratic time
For two strings a, b of lengths m, n respectively, the longest common subsequence (LCS) problem consists in comparing a and b by computing the length of their LCS. In this paper, we define a generalisation, called “the all semi-local LCS problem”, where each string is compared against all substrings of the other string, and all prefixes of each string are compared against all suffixes of the ot...
Semi-local String Comparison: Algorithmic Techniques and Applications
The longest common subsequence (LCS) problem is a classical problem in computer science. The semi-local LCS problem is a generalisation of the LCS problem, arising naturally in the context of string comparison. Apart from playing an important role in string algorithms, this problem turns out to have surprising connections with computational geometry, algebra, graph theory, as well as applicatio...
All Semi-local Longest Common Subsequences in Subquadratic Time
For two strings a, b of lengths m, n respectively, the longest common subsequence (LCS) problem consists in comparing a and b by computing the length of their LCS. In this paper, we define a generalisation, called “the all semi-local LCS problem”, where each string is compared against all substrings of the other string, and all prefixes of each string are compared against all suffixes of the ot...